Bumblebees acquire alternative puzzle-box solutions via social learning

The astonishing behavioural repertoires of social insects have been thought largely innate, but these insects have repeatedly demonstrated remarkable capacities for both individual and social learning. Using the bumblebee Bombus terrestris as a model, we developed a two-option puzzle box task and used open diffusion paradigms to observe the transmission of novel, nonnatural foraging behaviours through populations. Box-opening behaviour spread through colonies seeded with a demonstrator trained to perform 1 of the 2 possible behavioural variants, and the observers acquired the demonstrated variant. This preference persisted among observers even when the alternative technique was discovered. In control diffusion experiments that lacked a demonstrator, some bees spontaneously opened the puzzle boxes but were significantly less proficient than those that learned in the presence of a demonstrator. This suggested that social learning was crucial to proper acquisition of box opening. Additional open diffusion experiments where 2 behavioural variants were initially present in similar proportions ended with a single variant becoming dominant, due to stochastic processes. We discuss whether these results, which replicate those found in primates and birds, might indicate a capacity for culture in bumblebees.


Introduction
The diversity of behaviours observed in some insect societies is on a par with, or exceeds that, of some mammals [1,2] and includes the construction of architecturally complex, climate-controlled nests, and the division of labour between foraging, brood care and nest defence [3,4]. Indeed, outside the human realm, their nesting structures are unparalleled in terms of their regularity, sophistication, and their scale in proportion to body size [5]. There is profound variation in foraging specialisations, architectures, and social organisations not just between related species of social insects, but more intriguingly, even within species [4,5] specialisations have historically been viewed as a limited set of preprogrammed responses to external stimuli, resulting from evolutionary trial-and-error processes [2], this innate repertoire is supplemented by a remarkable capacity for learning that has been recognised for decades. The acquisition of the honeybee dance language is, perhaps, the best-characterised example of social learning described thus far in an invertebrate [6], and as early as 1884, Charles Darwin suggested that "nectar-robbing" of flowers by bumblebees could spread socially. Here, a forager bites into the base of flowers to extract the nectar, but does not pollinate the plant [7]. That socially transmitted nature of this behaviour has since been confirmed: in the wild, nectar-robbing is thought to have repeatedly arisen as independent innovation events and to spread through local bumblebee populations via rapid social learning [8,9]. The phenotype-first theory of evolution, otherwise known as the Baldwin effect, describes how beneficial behavioural traits acquired during life might be passed on to offspring via selection that favours the acquisition of such behaviour, such as on learning ability or behavioural biases [10,11]. If a learned, beneficial behavioural innovation were to be maintained in a population via social learning, it seems likely that selection might act to favour variants that are more capable of learning the behaviour in question. The idea that what now appears merely instinctive may have originated via learning has the potential to explain the evolutionary origin of many complex behaviours. We here explore the possibility that social learning might, at least in theory, have contributed to the advent of unique behavioural innovations in social insects, using bumblebees (Bombus terrestris) as a model.
B. terrestris forms subterranean colonies that reach up to approximately 160 individuals on average [12], and in the absence of environmental hazards individual workers can survive up to 2 months post-eclosion [13]. Their colonies persist for a single season in temperate climates before collapsing and are survived only by new queens that leave to found new colonies in the subsequent spring [14]. In the laboratory, bumblebees have been shown to acquire both simple information such as flower colour choice [15,16] and relatively complex, nonnatural foraging techniques from other bees, such as string-pulling, in paired dyad paradigms [17]. However, the spread of such techniques has not yet been observed under the conditions that are considered the gold standard for demonstrating the spread of behaviours in groups of animals: the so-called open diffusion paradigm [18]. These experiments are of high ecological validity and involve the release of a trained demonstrator into a group of naïve observers, along with provision of the substrates necessary to perform the target behaviour.
To investigate social learning, a two-action control experimental design is helpful, as this can help exclude alternative explanations. Such designs have been used to great effect to investigate social learning in chimpanzees, with demonstrators trained to obtain food from a "panpipes" apparatus in 1 of 2 different ways, and observers acquiring their demonstrator's technique [19]. Meanwhile, without a demonstrator, no chimpanzees acquired either technique. More recently, in great tits, demonstrators were trained to open a puzzle box in 1 of 2 possible ways before seeding them back into wild populations [20]. The demonstrator's preferences spread throughout these groups and were maintained long term, even when the alternative behaviour was discovered and even though the 2 variants were entirely arbitrary. For the present study, we designed two-option puzzle box feeders informed by previous work on bumblebee problem-solving [21], which replicated those used to investigate the spread of arbitrary behavioural differences in great tit populations. We then developed an open diffusion protocol that allowed the spread of box opening through the group to be recorded, and seeded colonies of bumblebees with demonstrators trained to perform 1 of the 2 possible behavioural variants. These novel foraging techniques were successfully acquired by untrained bees via social learning.

Box-opening behaviour spread through bumblebee colonies under open diffusion conditions
To determine whether bumblebees could acquire and sustain cultural variation, we designed twooption puzzle boxes that could be opened by rotating a clear lid around a central axis by either pushing a red tab clockwise or a blue tab counter-clockwise (termed the "red-pushing behavioural variant" and "the blue-pushing behavioural variant," respectively) to expose a 50% w/w sucrose solution reward, as indicated by a yellow target (Fig 1A). Video files showing bees performing both behavioural variants are available in the Supporting information. Stages of the incremental training protocol developed for demonstrators are depicted in Fig 1B, and full details of this and the open diffusion protocol can be found in the Materials and methods. Demonstrators trained to perform the red-and blue-pushing behavioural variants will henceforth be referred to as "redpushing demonstrators" and "blue-pushing demonstrators," respectively. We conducted 3 experiments in total, with Experiments 1 and 2 involving the seeding of a single trained demonstrator into a population (and open diffusion experiments being conducted over 6 or 12 consecutive days, respectively). We also provided opportunities for bees to innovate and solve the boxes without social input, in control populations where no demonstrator was present. Experiment 3 involved the seeding of multiple demonstrators into a population (including 2 red-pushing demonstrators and 2 blue-pushing demonstrators) and was conducted over 12 consecutive days.
For the open diffusion experiments, 8 puzzle boxes were presented in the flight arena and all bees were allowed into the flight arena to freely interact with these boxes. Each day, bees received 30-min pretraining with lidless boxes, which left the yellow target fully accessible and allowed bees to form an association between the yellow target and the 50% w/w sucrose solution reward. This training period was followed by a 3-h experimental period, during which the puzzle boxes were closed and bees had to push on 1 of the 2 tabs to access the reward. Once each box had been opened and the sucrose reward consumed, it was removed from the arena. During this period, we monitored how often each box was opened and determined the identity of the bee responsible for each opening.
A total of 6 bumblebee colonies were used for the 6-day, single-demonstrator open diffusions of Experiment 1 (2 colonies seeded with a red-pushing demonstrator and 2 seeded with a blue-pushing demonstrator, and 2 control colonies). Untrained bees opened puzzle boxes in all 4 experimental colonies and a number of these bees (n = 14) met the learning criterion as defined in the Materials and methods ( Fig 1C). In contrast, in the 2 control colonies, only 1 individual opened a box, despite these colonies being of a comparable size to the experimental colonies. This bee did proceed to meet the learning criterion (Fig 1C), although it only opened boxes sporadically (n = 5 incidences in total; Fig 1D).
Bees that met the learning criterion in the presence of a trained demonstrator were significantly more proficient than those that did so without The emergence of this spontaneous learner in the absence of any demonstrator necessitated Experiment 2: single-demonstrator open diffusions run for 12 days instead of 6 to determine whether more spontaneous learners would emerge over a longer period. A total of 4 colonies were used for these experiments (2 experimental colonies, with 1 seeded with a red-pushing demonstrator and the other with a blue-pushing demonstrator, and 2 control colonies). Untrained bees from all 4 colonies, experimental and control, reached the criteria to be considered learners (Table 1). Remarkably, the highest number of individuals met learning criteria in colony C4 (n = 9 versus n = 4 in each experimental colony).
Although box opening could evidently arise spontaneously in the absence of social learning, we found that bees that learned from demonstrators were significantly more proficient at box opening than learners from control colonies (Fig 1D). We pooled learners from Experiments 1 and 2 and calculated individual proficiency indices to allow comparison between bees that met the learning threshold at different points in the diffusion (and thus had more or less chance to accumulate box openings). These indices were calculated as follows: total individual opening incidence/total days spent as a learner, inclusive of the day the learning threshold was met. The difference between learners from experimental and control colonies was significant, with learners from experimental colonies opening significantly more boxes than spontaneous learners from control colonies (median boxes opened per day after reaching learning threshold: experimental colonies, 27.9 (IQR = 2.25 to 60.6); control colonies, 1.15 (IQR = 0.69 to 3.42); Mann-Whitney U test: W = 253, p = 0.001; Fig 2A). This individual-level difference in  Learner proficiency is higher in bees exposed to a demonstrator compared with those without and increases more rapidly. (A) Difference in proficiency between learners from experimental and control colonies. Proficiency indexes were calculated for each individual learner as follows: total incidence of box opening/days spent as a learner, with the latter spanning from the day learning criteria was met to the end of the diffusion. Control, n = 14, including data from colonies C2-4; experimental, n = 22, including data from colonies R1-3 and B1-3.) �� p < 0.01 (Mann-Whitney U test). (B) Change in learner proficiency from the day learning criteria were met ("day 1") to 2 days later ("day 3") in control and experimental diffusion experiments. Learners who did not meet the learning criteria early enough to have data for a third day (e.g., those who met criteria on day 5 or 6 of the 6-day colonies or on day 11 or 12 of the 12-day colonies) were excluded from analysis. Control group, n = 11; experimental group, n = 18. The data underlying this figure can be found in https://doi.org/10.6084/m9.figshare.21353973.
https://doi.org/10.1371/journal.pbio.3002019.g002 proficiency translated into colony-level differences in the frequency of box opening (median boxes opened per day by learners, experimental colonies: 76.8 (IQR = 45.4 to 83.2); control colonies, 2.1 (IQR = 0.6 to 8.1); Mann-Whitney U test: W = 28, p = 0.006). This suggested that, although social learning was not required for a bee to perform box-opening behaviour, it was necessary for the behaviour to become fixed in an individual's (or population's) repertoire.
Next, we sought to analyse the effect of demonstrator presence on learner proficiency over time. To achieve this, we compared the incidence of box openings on the day an individual met learning criteria (termed "day 1") and 2 days later (termed "day 3"; Fig 2B). Analysis with a linear mixed-effects model identified a statistically significant interaction between treatment (exp versus ctrl) and day (day 1 versus day 3), with bees in the experimental group experiencing a greater increase in proficiency on day 3 compared with day 1 than bees in the control group (F = 10.152, df = 1, p = 0.003; S2 Table). This confirmed that the presence of a demonstrator led to increased proficiency and more persistent behaviour among learners in the experimental colonies, indicating that the presence of social information was crucial to the long-term persistence of novel behaviours and behavioural variants within the population.

When exposed to demonstration of a single arbitrary behavioural variant, learners acquired a strong preference for this variant that persisted even when the alternative behaviour was discovered
Puzzle boxes could be opened in 1 of 2 ways: by pushing either the red tab clockwise or the blue tab anticlockwise. In every experimental colony, the behavioural variant performed by the demonstrator became dominant among the learners ( Table 2). This preference remained even when learners discovered the alternative behaviour: more than half the learners from the experimental colonies performed the non-demonstrated variant at least once (12 out of 22 bees; S3 and S4 Tables). Nonetheless, individual learners from experimental colonies were significantly more likely to perform their taught variant than the alternative (median proportion of box openings made using the taught variant, 1.0 (IQR = 1.00 to 0.99; taught variant

C4
12 n/a n/a n/a n/a 269 22.42 5.9 94.1 1 Demonstrator data include incidences of box opening by trained demonstrators only. 2 Box openings were assigned to observers only when they pushed a tab �50% of the required distance to open the box and obtained the reward. 3 Mean prevalence of the demonstrated behavioural variant across all experimental colonies. Colonies B1, B2, and B3 were each seeded with a "blue-pushing demonstrator." Colonies R1, R2, and R3 were each seeded with a "red-pushing demonstrator." Colonies C1, C2, C3, and C4 were controls that lacked a demonstrator.  (Table 2). Learners from control colonies had no taught variant but preferred to open boxes using the blue-pushing variant (median proportion of box openings made using the blue-pushing variant, 0.99 (IQR = 1.00 to 0.78). While this preference was not as strong at the colony level as in the experimental colonies (C2, 60.0%; C3, 87.8%; and C4, 94.1% prevalence; Table 2), it did suggest an innate inclination to perform the blue-pushing behavioural variant, an unsurprising result in light of the well-known preference for the colour blue among bumblebees [22,23]. There was also no significant difference between the strength of preference between control and experimental colonies: control learners preferred their favoured variant just as strongly as experimental learners (Mann-Whitney U test, W = 379, p = 0.164). The fact that learners exposed to a red-pushing demonstrator showed just as strong a preference for the red-pushing variant as control learners did for their preferred variant, even in the face of a natural inclination towards the blue-pushing variant, provides further evidence suggesting that social learning is key to the transmission of puzzle-box opening. In short, a learned preference for red can overcome an innate preference for blue.
Overall, the results of Experiments 1 and 2 suggest that spontaneous learners from control colonies are less proficient at box opening than social learners, opening fewer boxes in total. Social learners acquired strong preferences for the demonstrated variant even though more than half tried the alternative variant. This translated to greater proficiency and strong preferences for a single variant at the colony level.

Box-opening behaviour persisted among learners in experimental colonies but was vulnerable to collapse in the controls
Experiment 2 also provided some insight as to the likelihood of box-opening behaviour persisting over time and/or generations of learners. Fig 3 depicts the daily incidence of boxopening behaviour by observers throughout all single-demonstrator diffusions (full data is presented in S1 Fig and S1 Table). The incidence of box opening by observers was significantly positively correlated with experimental day in 5/6 experimental colonies, while there was no significant correlation in 3/4 control colonies (Spearman's rank order correlation tests; Fig 3 and S1 Table). The control colonies also appeared to be subjected to greater fluctuation: While peaks and troughs in incidence do occur in the experimental colonies, boxpushing still occurred relatively frequently on these days compared with control colonies. In colony C3, box opening appears to arise on 2 separate occasions (day 3 and day 9), separated by a complete collapse on day 6. Our diffusion experiments were run for 12 days at the maximum and did not include defined generations of learners, but the results do indicate that behaviours arising in control colonies are less likely to persist over time compared with those in the experimental colonies, even when the number of learners is comparable. Taken together, these results suggest that it was social learning that underpinned the spread and maintenance of box opening and its variants in the experimental colonies under open diffusion conditions.

When exposed to both behavioural variants in roughly equivalent proportions, one behavioural variant became dominant in each population
The experiments described in this article thus far (and, indeed, most open diffusion experiments conducted in the literature) begin with arbitrary local variation already established (i.e., demonstrators are trained to perform a single behavioural variant approximately 100% of the time and the alternative is left to be discovered serendipitously, if at all). Such experiments are usually conducted to see whether fidelity to a particular learned behaviour would degrade over time or be sustained. Less work, comparatively, has been done on how such a local behavioural variation might emerge in the first place within a population. Thus, we conducted an additional experiment aimed at investigating what might happen if 2 behavioural variants were initially present in a population. Experiment 3 involved the seeding of multiple demonstrators into a population and used the same puzzle boxes and training protocol as Experiments 1 and 2. However, the flight arena was modified to permit the presentation of 16 puzzle boxes and the simultaneous exposure of 2 colonies, providing a larger population of foragers (Fig 4A). A total of 4 demonstrators were seeded into the population, with 2 trained to perform the redpushing behavioural variant and 2 the blue, and the diffusion was conducted for 12 days. This experiment was run twice, with the replicates termed population 1R2B2 and 2R2B2. In each replicate, demonstrators opened boxes using the 2 behavioural variants in close to equal proportions (proportion of all openings made with blue variant: 1R2B2, 0.550; 2R2B2, 0.564; S3  Fig and S5 Table).
A total of 18 observers met the learning threshold in population 1R2B2, with 9 of these performing box-opening behaviour >10 times and thus being termed proficient learners. In 2R2B2, 8 observers met the learning criterion, and 4 of these were classed as proficient learners Colonies R1, R2, and R3; and B1, B2, and B3 were seeded with single demonstrators trained to perform the red-pushing behavioural variant and the blue-pushing behavioural variant, respectively. Colonies C1, C2, C3, and C4 were controls that lacked a demonstrator. A box opening was assigned to an observer whenever it pushed on either tab �50% of the required distance and obtained the reward. Incidences of the "bluepushing behavioural variant" are depicted in blue, while incidences of the "red-pushing behavioural variant" are depicted in red. Data for the whole colonies and demonstrator activity can be found in S1 Fig and S1 Table. Relationships between experimental day and opening incidence (of any variant) by observers were analysed using Spearman's rank order correlation tests; � p < 0.05, �� p < 0.01, and ��� p < 0.001. The data underlying this figure can be found in https://doi.org/10.6084/m9.figshare.21353973.
https://doi.org/10.1371/journal.pbio.3002019.g003 (Fig 4B and Tables 3 and S7). Notably, by day 12 of diffusion, the red-pushing behavioural variant became the dominant technique among observers in population 1R2B2, while the bluepushing behavioural variant predominated in population 2R2B2 (Fig 4C and 4D and S6  Table). Strikingly, all demonstrators in this population had ceased their activity by day 6, due either to dying or taking up different jobs within the colony, meaning that all box opening during the remaining 6 days was performed by observers alone (S5 Table). Notably, 4 new individuals reached the learning threshold after day 6 in this colony, although we cannot confirm that they did not observe the initial trained demonstrators while these were active (Table 3). This apparently high demonstrator drop-off rate was likely due to our selection criterion: we aimed to choose only the "best," most reliable foragers as demonstrators to speed up the training process. Bumblebee workers tend to perform jobs within the hive while young, and only begin foraging later in life, if at all [14]. As more "skilled" foragers may have been doing so for some time, our chosen demonstrators likely tended to be older than the rest of the colony and so were more likely to die during the experiment.
In population 1R2B2, until day 8, the predominance of the red-pushing and blue-pushing behavioural variants fluctuated back and forth somewhat evenly (Fig 4C and 4D). However, from day 9 onwards, the red-pushing behavioural variant became increasingly predominant, and by day 12, 97.3% of the 263 incidences of observer box-opening behaviour were of the red-pushing variant. In population 2R2B2, observers preferred the bluepushing variant over the red on all days except day 1, where 16/18 box-opening incidences by observers were of the red variant. However, on days 11 and 12, there was a sharp drop in observer activity (Fig 4C and S6 Table), despite no apparent fall in box-opening overall (S3A Fig and S6 Table). This was driven by the activity of 2 trained demonstrators, which both remained active on day 12.  Asterisks represent the variant preference (following the figure for the preferred variant) and the strength of preference. �� Indicating a strong preference. � Indicating a weak preference.
As the sample sizes for non-proficient learners were small, no preferences or strength of preferences were assigned. https://doi.org/10.1371/journal.pbio.3002019.t003

When faced with demonstrations of both variants, proficient learners tended to form preferences for a single variant
Proficient learners, who performed box-opening behaviour >10 times, formed strong individual preferences for a single variant with few exceptions (median proportion of box openings made using the preferred variant, 0.84 (IQR = 0.97 to 0.73; preferred variant incidence versus non-preferred variant incidence: Wilcoxon signed-rank test, V = 91, p < 0.001 versus chance level). Notably, in both populations, some individuals developed a preference for the bluepushing variant and others for the red-pushing variant. Even though learners generally performed both behavioural variants during the experiment, once a learner developed a preference for either variant, they maintained this preference and showed no evidence of switching variants (Fig 5): There was no significant difference between variant preference between the first day of learning and the last day of recorded activity (linear mixed-effects model, F = 0.626, df = 1, p = 0.4377; S8 Table). In 1 case (b18), a preference for the red behavioural variant remained even after a 2-day pause in foraging activity (Fig 5). Shifts in colony-level preference appeared to be due to the emergence of new learners (that developed their own individual preference) and the cessation of foraging activity by individuals with established preferences, rather than currently active foragers changing their behaviour to match the current majority.

Discussion
The results of the present study provide strong evidence that social learning underpins the transmission of novel foraging behaviours in bumblebees, but most importantly that these behaviours (and arbitrary variants of these) can spread through groups of bumblebees under open diffusion conditions. Our results are particularly notable because they echo those found previously in chimpanzees and great tits, using similar two-action control designs and open diffusion paradigms [19,20]. Furthermore, as with the great tits, in the experimental singledemonstrator diffusions, some untrained observers discovered the alternative, non-demonstrated technique, but reverted to the demonstrated variant. This suggests that although bees were able to generalise their behaviour and were not entirely ignorant of the alternative technique, they still preferred the variant that was demonstrated to them. Similar results were also found in vervet monkeys: subordinate monkeys who are prevented from accessing a demonstrated food type, and thus have extensive experience with an equally palatable alternative, revert back to a demonstrated food choice whenever they have the chance [24,25]. The similarities are striking and may suggest the existence of similar learning strategies. To our knowledge, this is the first demonstration of the rise, spread, and maintenance of arbitrary behavioural variation under such conditions in any invertebrate.
The two action-control design confirmed that social learning was involved in the spread of box-opening behaviour, even though some individuals opened puzzle boxes without a demonstrator. The colour red sits at the very edge of the bee visual spectrum, while blue is among their most preferred colours [22,23]. This strength of preference for blue over red suggests that naïve bees should be drawn disproportionately towards the blue tab, simply because its colour is a highly salient feature to them. Indeed, when bees opened boxes in the control diffusions, their preference was for the blue behavioural variant. However, bees exposed to a demonstrator trained to perform the red-pushing behavioural variant were no less proficient than those exposed to a one that performed the alternative, and their preference for their demonstrators' behavioural variant was no weaker. Even when the alternative blue-pushing behavioural variant was discovered by these learners, they still reverted to their demonstrator's red-pushing technique.
Furthermore, control learners opened fewer boxes compared with those from the experimental diffusions, despite these learners needing to compete with a highly proficient trained demonstrator for boxes. If bees from the control diffusions had learned in the same way as those in experimental diffusions, they should have been able to accumulate more openings due to lack of competition. Bees in the control diffusions should have been more driven to forage: For the full 3 h of the daily open diffusion, the experimental colonies enjoyed an influx of high-value 50% w/w sucrose solution from successful foragers while the control colonies received little or none.
The daily pretraining ensured that bees in both conditions knew that the yellow target on the puzzle box indicated a reward. When they could no longer access this during the open diffusion, but could still see the yellow target through the lid, some bees in the control colonies began to investigate the target and other components of the box. Eventually, this would inadvertently move either the blue or red tab closer to the target. If this behaviour was undirected or very incremental, perhaps it was simply not possible for bees to form an association between tab-pushing and it with the reward. In contrast, bees in the experimental colonies were repeatedly drawn towards relevant features of the box via the presence of the demonstrator and, thus, local enhancement. This may have resulted in their own attempts to open boxes being more directed. Alternatively, the constant, repeated reinforcement of the tab being juxtaposed with the target, as in opened boxes, may have permitted the association to form via stimulus enhancement.
Bees in the control colonies that met the learning criterion may well have experienced some degree of social input: once 1 bee opened a box, it might have acted as a demonstrator for others. But if these initial learners and those that followed them were not proficient, subsequent learners may not have received enough social input. The trained demonstrators were highly proficient and regularly opened boxes more than 100 times a day (Tables 2 and S1). In contrast, the most proficient control learner, bee y12 from colony C4, opened boxes just 68 times on its most successful day (S5 Table). This particular bee was an outlier with 216 box openings recorded during the experiment (for scale, its closest rival among the control learners opened boxes just 22 times in total; S5 Table). From this, it seems clear that it was the presence of the demonstrator that caused box-opening behaviour to become fixed in an observer's repertoire, and a certain number of demonstrations were needed to secure the association between action and reward.
In the multiple-demonstrator diffusion experiments, most proficient learners acquired a preference for 1 of the 2 possible behavioural variants, and once developed these seemed to be fixed. Thus, the predominance of red-and blue-pushing behaviour was largely governed by stochasticity associated with experienced individuals retiring from foraging and new learners arising. As with all stochastic processes, it is possible that in larger groups, the establishment of stable local variations in behaviour would be slower or even attenuated altogether. This persistence of individual preferences may represent a strategy to decrease cognitive load through habit formation [26] or an avoidance of costs associated with switching from an old behaviour that one has become proficient in to a new behaviour [27][28][29]. While the exact mechanisms underlying the development of individual behavioural preferences in bumblebees remain unclear and warrant further investigation, it does seem likely that subsequent learners in this experiment would have continued to acquire a preference for the red-pushing behavioural variant. The strength of bias towards this variant on days 10 to 12, in Experiment 3, was comparable to that observed in the experimental single-demonstrator diffusions, and learners in these experiments robustly acquired their demonstrator's preference. However, further work would need to be done to confirm this.
The similarities between our results, found in bumblebees, and those found in primates [19,24] and birds [20] using similar paradigms, are noteworthy because these previous studies argued to demonstrate the capacity of these species for culture. Culture is broadly defined as the sum of a population's behavioural traditions, which are in turn defined as socially learned behaviours that persist within a population over time and/or generations: that is, the persistence of the behaviour within the population is key, as is its spread between multiple individuals [30][31][32][33]. Behaviours that meet the prerequisites to be considered traditions have since been found to occur naturally in a number of animals. These include tool selection in chimpanzees [34], song dialects in birds [35,36], and feeding techniques in humpback whales [37]. However, the majority of these studies focus on vertebrates with relatively large brains. Whether bumblebees are likely to develop similar behavioural traditions naturally remains an open question.
The lifespan of individual B. terrestris workers is brief, and colonies collapse before the winter [13,38]. Although there may be multiple, sequential sets of workers present during that time, these do not represent true biological generations, and if no workers survive past the decline of annual colonies at the end of the season, any foraging traditions should be lost with them. Thus, it seems unlikely that B. terrestris would build cultures that span biological generations in the wild, but the results reported here support the notion that the cognitive capacities for this to occur are in place. Nectar-robbing in the wild might represent a form of temporary culture, if it indeed arises via innovation followed by rapid local spread by social learning [9]. The occurrence of multiple separate innovation events does not preclude a behaviour from being cultural: All culture requires the capacity for behavioural innovation in the first place, so its existence alone cannot act as evidence against culture.
The Baldwin effect might also provide a more roundabout route for culture that spans biological generations in bumblebees, if selection favours those queens whose worker progeny are more likely to develop such traditions. However, there are social insect species that form colonies that last for years, and it might be such cases where cultural behavioural phenomena might be most rewardingly studied. These include honeybees [39], certain tropical bumblebee species including Bombus medius [40], Bombus atratus [41], Bombus rufipes [42], and stingless bees [43,44]. If the learning abilities of these species resemble those of B. terrestris, it seems plausible that such culture might be found naturally among them.
Even if culture is rare or nonexistent in extant wild social insects, the mere existence of this capacity in an invertebrate (given the right conditions and opportunity, as provided here) makes it plausible that cultural processes may have contributed to the emergence of behavioural foraging specialisations, nesting architecture, and colony organisation. Perhaps elements of the vast, complex repertoire of innate behaviours seen in social insects were not always so instinctive [11]. The reason that we have often failed to see evidence of culture in some nonhuman animals may be that we are simply looking too late [5]. Social insects may represent an exciting model to investigate these hypotheses in the future.

Animal model
Colonies of bumblebees (Bombus terrestris audax) were obtained from Agralan, (Swindon, United Kingdom) or Koppert Biological Systems Nederland (Berkel en Rodenrijs, the Netherlands). Bees were housed in 30.0 × 14.0 × 16.0 cm bipartite wooden nest boxes, and all individuals were marked with numbered Opalith tags for individual identification during transfer to these nest boxes. This involved trapping each bee in a small cage, gently pressing it against the mesh with a sponge, and affixing the tag to the dorsal thorax with a small amount of glue. The nest boxes were connected to flight arenas (66.0 × 60.0 × 30.0 cm or 132.0 × 60.0 × 30.0 cm for single-and multiple-demonstrator experiments, respectively) via 26.0 × 3.5 × 3.5 cm clear acrylic tunnels, which could be blocked to limit access to the flight arena. Bees were allowed to forage ad libitum on 20% w/w sucrose solution provided in mass feeders in these arenas overnight, and pollen was provided every 2 days. Colonies were maintained at standardised room temperature throughout the study, and experiments were conducted under standardised artificial lights

Experimental set-up and puzzle box design
Puzzle box. The two-option puzzle box (Fig 1A) incorporated a rotating, transparent lid that could turn either clockwise or anticlockwise around a central axis, effectively providing 2 possible ways to "open" the box and obtain a reward; in this case, 50% w/w sucrose solution placed on a yellow "target." This target was always visible due to the transparent lid, but inaccessible without rotating the lid either clockwise or anticlockwise by pushing a red tab or a blue tab, respectively. Other than the direction of rotation and tab colour, there was no difference between the 2 box-opening behavioural variants, which will henceforth be referred to as the "red-pushing behavioural variant" and the "blue-pushing behavioural variant." A stopper attached to the edge of the boxes prevented bees pushing further onwards after accessing the reward, and a plastic strip around the circumference acted as a "shield" to prevent bees obtaining the reward by reaching with their proboscis from the side.
Flight arena. Experiments were conducted in specially designed flight arenas (see Figs 1A and 4). Flaps were cut into the side of the arena through which the puzzle boxes could be removed and replaced, with minimal disturbance to the bees inside the arena. Brush strips lined the inside of the flight arena to prevent bees from escaping during this process. The top of the flight arena was a sheet of transparent UV-transmitting acrylic sheet, and cameras were placed on top of the arena so that the bee ID tags could be captured while recording the diffusion experiments. The flight arenas for the single-demonstrator diffusion experiments had space for 8 puzzle boxes (Fig 1A), while those used for the multiple-demonstrator diffusion experiments were expanded to provide space for 16 puzzle boxes (Fig 4). This was done to reduce competition for boxes. To ensure an adequate supply of naïve foragers, the arena was also made bipartite to allow the inclusion of 2 colonies in a single experimental population. The 2 nest boxes were connected to opposite ends of the flight arena with a central divider positioned between them. This was removed during the diffusion experiments, but was reinstated to allow the 2 colonies to forage in separate flight arenas overnight. No inter-colony aggressive interactions were observed during the experiment, and care was taken to return bees to their original colonies after each diffusion session.
Demonstrator training protocol. Potential demonstrators were identified during initial group foraging on yellow acrylic chips, which were placed in the flight arena and loaded with 50% w/w sucrose solution. When a bee was observed repeatedly and reliably coming back and forth between the nest box and the flight arena to forage, it was selected for further training. All other bees were restricted to the nest box, and the chosen individual was subjected to an incremental training protocol to learn how to open the puzzle box in 1 of the 2 possible ways, either by pushing the red tab clockwise (S1 Video) or the blue tab anticlockwise (S2 Video and Fig 1B). The reward used throughout training was 10 μl 50% w/w sucrose solution, and boxes were wiped with 70% ethanol each time they were refilled to remove any olfactory cues.
Training began with the puzzle boxes presented in the "fully open" position, with the yellow target completely exposed and the tab of the selected colour positioned closest to it (Fig 1Bi shows the starting position for training a bee to push the blue tab). The position of the tab was then modified over time, rotating further and further over the yellow target until it was inaccessible without the bee pushing against the tab. At this point, the bees would inadvertently move the tab forwards while probing the gap beneath it with the proboscis, apparently on an incidental basis. However, as training continued, this behaviour became noticeably more directed. To prevent bees losing motivation, if they failed to access the reward at any point, the reward in the next box was made easier to access.
Training continued until the 2 tabs were almost equidistant from the yellow target, with the trained tab being approximately 1.0 cm closer. At this point, the bee progressed to a learning test, which consisted of a box with the 2 doors equidistant from the yellow target and distilled water in place of the sucrose solution reward. Bees were permitted to leave the flight arena after 5 min, but if the box remained unopened this was considered a failed test. If the box remained unopened after 10 min, the test also ended in failure, and individuals who failed the test were returned to training until they met test criteria again. However, if the bee opened the box successfully within the time limit, it was used as a demonstrator for the open diffusion experiments.
Demonstrators were trained using the same protocol for both the single-and multipledemonstrator diffusion experiments. The sole difference was that for the multiple-demonstrator diffusion experiments, 4 demonstrators were trained over 2 days (2 to perform the redpushing behavioural variant and 2 to perform the blue-pushing behavioural variant). The 2 "red demonstrators" were trained together and the 2 "blue demonstrators" were trained together to save time. The reason 2 demonstrators were trained for each variant was simple: trained demonstrators might, on occasion, die before a diffusion experiment had commenced. As the main point of the experiment was to have both behavioural variants being demonstrated simultaneously with as close to equal incidence as possible at the start, it was decided that 2 should be trained to perform each variant.
Single-demonstrator open diffusion protocol. Colonies seeded with a demonstrator trained to perform the red-pushing behavioural variant will henceforth be referred to as "red colonies," and those seeded with a demonstrator trained to perform the blue-pushing behavioural variant will be termed "blue colonies." Experiments were conducted for either 6 or 12 consecutive days; 6 days to provide proof of concept and 12 to see whether box-opening behaviour would persist in a group for longer. Each day at approximately 9.30 AM, the mass feeders were removed from the flight arena and bees were returned to the nest box. If more than 2 honeypots were full, the sucrose solution was removed to ensure a strong motivation to forage.
After approximately 30 min, bees were allowed unrestricted access to the flight arena, where they received 30 min group pretraining with 8 lidless boxes, with the yellow targets (bearing 10 μl 50% w/w sucrose solution rewards) fully exposed. The absence of the lid prevented bees from making associations between either tab colour and the reward during this time, which was primarily to ensure that the bees maintained a strong association between the colour yellow and the reward, and to encourage as many into the flight arena as possible before the proper diffusion began (taking advantage of the natural honeypot monitoring and food alert behaviours of bumblebees [45,46]). Following this, the boxes were removed, wiped with 70% ethanol, and the targets were refilled with 20 μl 50% w/w sucrose solution. This doubling of the reward volume ensured that the demonstrator would still be sufficiently rewarded if observers began "scrounging"; taking the reward from boxes opened by the demonstrator without opening any themselves. This behaviour was commonplace during pilot studies, so this measure was taken to prevent the demonstrator losing motivation to forage.
The diffusion experiment commenced immediately following pretraining, and the bees were presented with 8 fully closed boxes for 3 h. The experimental colonies were seeded with a trained demonstrator while control colonies lacked one, but all other aspects of the protocol were identical. When boxes were depleted of sucrose solution, they were removed from the flight arena through the flaps, washed with 70% ethanol, refilled, and then replaced. In the single-demonstrator diffusion experiments, if the demonstrator attempted to perform the behavioural variant it had not been trained on (e.g., attempting to open the box by pushing the blue tab anticlockwise when it was initially trained to push the red tab clockwise) it was prevented from doing so, with the experimenter holding the lid of the box closed with tweezers. No other bees were inhibited from performing either behavioural variant in any way. This was done to try and maintain as much consistency between demonstrators as possible, so the experimental replicates would be more comparable. However, demonstrators attempted the alternate behaviour very rarely (see S1A Table).
A total of 6 colonies were used for the 6-day diffusions, 2 of which were seeded with a demonstrator trained to perform the red-pushing behavioural variant (R1, R2), 2 seeded with a demonstrator trained to perform the blue-pushing behavioural variant (B1, B2), and 2 control colonies (C1, C2). A further 4 colonies were used for the 12-day diffusions (B3, R3, C3, and C4), following these naming conventions. In the event that a demonstrator died before any observers met learning criteria, a new bee was selected for individual training. Pretraining and 3 h open diffusion immediately followed a successful learning test. This situation occurred on both days 1 and 2 for colony R3. However, if a demonstrator died after an observer met learning criteria, it was not replaced. The 3 h open diffusions were filmed from above using iPhones (30 fps, 720p; 1 camera per 2 boxes to ensure number tags were clear).
Multiple-demonstrator open diffusion protocol. The multiple-demonstrator diffusion protocol was identical to the single-demonstrator diffusion protocol, with 2 exceptions. First, no bees were restricted from performing either behaviour. This would have been too difficult to achieve with 4 individuals at once, and it was possible that the demonstrators might change their own trained preferences during the experiment as part of a local tradition becoming established (or not). In any case, as in the single-demonstrator diffusion experiments, the demonstrators rarely attempted to perform the alternative, non-trained technique (S1B Table). The second change was that, due to the free-form nature of this experiment and the presence of multiple demonstrators for each variant, demonstrators were not replaced if they died. Two colonies were used for this experiment, and were combined to form a single population as aforementioned.

Video analysis
Video analysis was conducted in BORIS 7.10.2 [47], and point events were coded whenever a box was opened. In some cases, the tabs would initially be pushed by a bee that would leave before opening the box fully; thus, each opening was assigned to the ID of the bee responsible only when they pushed a tab �50% of the required distance to obtain the reward. In cases where observers obtained the reward by pushing <50% of the required distance, these opening instances were included at the overall colony level but were not assigned to an individual; thus, these were absent from observer-specific data. This ensured that all openings assigned to observers involved directed, sustained pushing at the tab, and so unlikely to be by chance.

Learning criteria
Untrained bees were considered to have made the transition to learners when they had performed full box opening twice, irrespective of behavioural variant, as this repetition of the behaviour suggested it was not done at random. These criteria split the untrained bees into 2 groups: learners and non-learners: those who had met criterion and those who had not.

Learner proficiency
For the single-demonstrator diffusion experiments, in order to compare learners across colonies, individual proficiency indices (p i ) were calculated for each learner as follows: where O i represented the total opening incidence by the individual in question, d l represented the total of days spent as a learner (inclusive of the day learning criteria was met), d the length of the diffusion in days, and x the day learning criteria was met. This allowed comparison between learners that met criteria at different points in the diffusion, e.g., a bee that learned on the second day may have accumulated more openings than a bee that learned on the fifth day not due to proficiency; simply by means of having more time. As 2 distinct clusters of learners emerged during the single-demonstrator diffusion experiments: those that performed the target behaviour repeatedly and often, and those that did so rarely and sporadically (in some cases, only performing the behaviour the 2 times required to meet learning criteria), a simpler method was used to classify learners based on their proficiency in the multiple-demonstrator diffusion experiments. Bees that performed box-opening behaviour (of either variant) >10 times were considered proficient learners, while those that performed the behaviour �10 times were considered non-proficient learners.

Learner preference
In the multiple-demonstrator diffusion experiments, learners were classed as having a preference for the red variant, a preference for the blue variant, or no preference, based on the proportion of each behavioural variant that they performed throughout the experiment. The boundaries were as follows: red preference, >60% red variant; blue preference, >60% blue variant; no preference, no variant >60% (so, both variants being �60% and �40%).

Statistical analysis
Data were analysed with R 4.0.3 [48] and figures were produced using ggplot2 [49]. Due to non-normally distributed data, Wilcoxon signed-rank tests were used to compare paired samples and Mann-Whitney U tests were used to compare unpaired samples.
When comparing individual learner proficiency (p i ) between "experimental" and "control" colonies, there was no significant difference in p i between "red colonies" and "blue colonies," allowing these data to be pooled into a single "experimental" group (W = 84.5, p = 0.268; Mann-Whitney U test). Spearman's rank order correlation tests were used to analyse the relationship between experimental day and incidence of box opening by observers.
Two linear mixed-effects models were also used to analyse the data, using the R packages lme4 [50] and lmertest [51]. The first analysed the effect of demonstrator presence on learner proficiency over time and included 2 categorical fixed effects: 1 between-subjects factor "treatment" (experimental, control) and 1 within-subjects factor "day" (day 1, day 3; where day 1 was the day an individual met the learning criteria and day 3 was 2 days following this). Individuals that learned too late in the diffusion to have any data recorded for day 3 were excluded (e.g., a bee that met criteria on day 5 or 6 in the 6-day diffusion experiments or day 11 or 12 of the 12-day diffusion experiments; n = 4 from the experimental colonies and n = 3 from the control colonies, leaving n = 18 and n = 11 in each group, respectively). The response variable was box-opening incidence. There was no significant difference between the learners from experimental colonies seeded with "red-pushing demonstrators" and those seeded with "bluepushing demonstrators" on either day 1 (independent samples Mann-Whitney U test; W = 55, p = 0.1440) or day 3 (independent samples Mann-Whitney U test; W = 35.5, p = 0.4175), so these were pooled into a single experimental group.
The second model analysed the effect of time on individual learner preferences and included 1 categorical fixed effect: 1 within-subjects factor "day" (day 1, day x; where day 1 was the day an individual met learning criteria and day x was the last day they were recorded performing box opening). Individuals only active on 1 day (n = 1) were excluded, leaving n = 12 individuals (n = 8 from population 1R2B2 and n = 4 from population 2R2B2). The response variable was the proportion of box-opening behaviour that was of the blue-pushing behavioural variant.
We used the Akaike information criterion (AIC) to compare 3 variations of each model: using bee ID as the sole random factor, using colony/population ID as the sole random factor, and using bee ID nested within colony/population ID, to find the version with the best fit (see S2 and S8 Tables). In each case, the model with the lowest AIC was chosen. For all comparisons, P < 0.05 was considered to indicate a statistically significant difference.